Recognizing Objects and Scenes in News Videos
Authors
Abstract
We propose a new approach to recognizing objects and scenes in news videos, motivated by the availability of large video collections. This approach treats the recognition problem as the translation of visual elements to words. The correspondences between visual elements and words are learned using methods adapted from statistical machine translation and are used to predict words for particular image regions (region naming), to predict words for entire images (auto-annotation), or to associate the automatically generated speech transcript text with the correct video frames (video alignment). Experimental results are presented on the TRECVID 2004 data set, which consists of about 150 hours of news videos associated with manual annotations and speech transcript text. The results show that retrieval performance can be improved by associating visual and textual elements. In addition, an extensive analysis of features is provided and a method to combine features is proposed.
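The idea of learning correspondences between visual elements and words can be illustrated with a minimal sketch. It assumes an IBM Model 1 style expectation-maximization procedure, which is one common choice in statistical machine translation; the abstract does not state which translation model is actually used. The visual token names (`blob_sky` and so on), the toy training pairs, and the function names are all hypothetical and serve only to show region naming and auto-annotation on invented data.

```python
from collections import defaultdict

def train_translation_table(pairs, iterations=20):
    """Estimate t[(word, token)] ~ p(word | visual token) with IBM Model 1 style EM.

    pairs: list of (visual_tokens, words), one entry per annotated image.
    This is an illustrative assumption, not the paper's confirmed model.
    """
    vocab = {w for _, words in pairs for w in words}
    # Start from a uniform distribution over words for every visual token.
    t = defaultdict(lambda: 1.0 / len(vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected (word, token) co-assignments
        total = defaultdict(float)   # expected assignments per token
        for tokens, words in pairs:
            for w in words:
                # E-step: share each word among the tokens of the same image.
                norm = sum(t[(w, v)] for v in tokens)
                for v in tokens:
                    frac = t[(w, v)] / norm
                    count[(w, v)] += frac
                    total[v] += frac
        # M-step: renormalize so the probabilities for each token sum to 1.
        t = defaultdict(float, {(w, v): c / total[v] for (w, v), c in count.items()})
    return t

def name_region(t, token, vocab):
    """Region naming: pick the most probable word for one visual token."""
    return max(vocab, key=lambda w: t[(w, token)])

def annotate_image(t, tokens, vocab, k=2):
    """Auto-annotation: rank words by their summed probability over all tokens."""
    scores = {w: sum(t[(w, v)] for v in tokens) for w in vocab}
    return sorted(vocab, key=lambda w: scores[w], reverse=True)[:k]

# Hypothetical toy data: each image is a bag of visual tokens plus its annotation words.
toy_pairs = [
    (["blob_sky", "blob_grass"], ["sky", "grass"]),
    (["blob_sky", "blob_water"], ["sky", "water"]),
    (["blob_grass", "blob_water"], ["grass", "water"]),
]
toy_vocab = {"sky", "grass", "water"}
table = train_translation_table(toy_pairs)
print(name_region(table, "blob_sky", toy_vocab))
print(annotate_image(table, ["blob_sky", "blob_water"], toy_vocab))
```

In this sketch no image tells the model which token goes with which word; the pairing emerges only because each token co-occurs with its word more consistently than with the others across the collection.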
Similar resources
Classification of TV sports news by DCT features using multiple subspace method
This paper proposes a method to automatically classify TV sports news articles using image processing and classification techniques. The classification algorithm is based on a multiple subspace method that provides each sports category with more than one subspace, corresponding to its typical scenes. The classification is performed without recognizing any objects in an i...
Online multiple people tracking-by-detection in crowded scenes
Multiple people detection and tracking is a challenging task in real-world crowded scenes. In this paper, we present an online multiple people tracking-by-detection approach with a single camera. We detect objects with deformable part models and a visual background extractor. In the tracking phase, we use a combination of support vector machine (SVM) person-specific classifie...
On the usefulness of attention for object recognition
Today’s object recognition systems have become very good at learning and recognizing isolated objects or objects in images with little clutter. However, unsupervised learning and recognition in highly cluttered scenes or in scenes with multiple objects are still problematic. Faced with the same issue, the brain employs selective visual attention to select relevant parts of the image and to seri...
Finding Person X: Correlating Names with Visual Appearances
People as news subjects carry rich semantics in broadcast news video and therefore finding a named person in the video is a major challenge for video retrieval. This task can be achieved by exploiting the multi-modal information in videos, including transcript, video structure, and visual features. We propose a comprehensive approach for finding specific persons in broadcast news videos by expl...
Statistical Background Modeling Based on Velocity and Orientation of Moving Objects
Background modeling is an important step in moving object detection and tracking. In this paper, we propose a new statistical approach in which a sequence of frames is selected according to the velocity and direction of some moving objects, and an initial background is then modeled based on the detection of changes in gray pixel values. To have used this sequence of frames, no estimator or distribu...